Translation Pattern Extraction and Recombination for Example-Based Machine Translation
Abstract
An approach to Example-Based Machine Translation is presented which operates by extracting and recombining translation patterns from a bilingual corpus aligned at the level of the sentence. The translation patterns are extracted using a recursive machine-learning algorithm based on the principle of similar distributions of strings: source and target language lexical items that co-occur in the same two sentence pairs are likely to be translations of each other. The extracted translation patterns are generalisations of sentences that are translations of each other, in which certain sequences of words are replaced by variables. The translation patterns resemble transfer rules to a certain extent, but with fewer constraints, since there is no concept of syntactic structure in this approach: translation patterns are extracted based only on the information inherent in the corpus. The strings and variables of which the translation patterns are composed are aligned to provide a more refined bilingual knowledge source, which is necessary for the recombination phase, where target language translations are produced. They are aligned by means of language-neutral techniques (cognates and bilingual lexical distribution) in order to maintain language neutrality and portability. A non-structural approach based on the distributions of surface forms in a corpus is error-prone and liable to extract translation patterns that are false translations. This thesis highlights some of the sources of those errors and proposes solutions to them based on the addition of external linguistic knowledge. First, the basic approach is augmented with morphological analysis. Second, part-of-speech tagging is incorporated. This results in three variants of the approach, each including more external linguistic resources than the last.
The performance of each variant is assessed and the variants are compared in order to test the hypothesis that an increase in linguistic knowledge improves performance, in terms of both recall and precision.
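The distributional principle described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's recursive algorithm: it handles only two sentence pairs that differ in a single contiguous span on each side, and the names `generalise` and `extract_pattern`, the variable symbol `X`, and the toy English-French sentences are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def generalise(sent_a, sent_b, var="X"):
    """Replace the spans where two sentences differ with a variable.

    Returns the generalised pattern and a list of (span_a, span_b) fillers.
    """
    a, b = sent_a.split(), sent_b.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    pattern, fillers = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pattern.extend(a[i1:i2])  # shared words stay in the pattern
        else:
            pattern.append(var)  # differing words become a variable
            fillers.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return " ".join(pattern), fillers


def extract_pattern(pair1, pair2):
    """Hypothetical one-variable extraction from two (source, target) pairs.

    Returns ((source_pattern, target_pattern), lexical_pairs), or None when
    the two pairs do not differ in exactly one non-empty span on each side.
    """
    src_pattern, src_fillers = generalise(pair1[0], pair2[0])
    tgt_pattern, tgt_fillers = generalise(pair1[1], pair2[1])
    if len(src_fillers) != 1 or len(tgt_fillers) != 1:
        return None
    (s1, s2), (t1, t2) = src_fillers[0], tgt_fillers[0]
    if not all((s1, s2, t1, t2)):
        return None  # an insertion or deletion leaves one filler empty
    # The differing source span is assumed to translate the differing target span.
    return (src_pattern, tgt_pattern), [(s1, t1), (s2, t2)]


result = extract_pattern(
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
)
print(result)
```

For the toy input, the sketch pairs the pattern `the X sleeps` with `le X dort` and proposes `cat`/`chat` and `dog`/`chien` as lexical translations. Because it relies on surface forms alone, it is open to the kind of false translations the abstract mentions.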
Similar resources
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. In Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
MetaMorpho: A Pattern-Based Machine Translation System
This paper describes an efficient real-time comprehension assistance and machine translation method. Combining the advantages of example-based (EBMT) and rule-based machine translation (RBMT), a new paradigm, pattern-based translation, is presented. A system based on these principles that features an innovative user-friendly interface has been built. Called MetaMorpho, the system has been tested...
Language Resources For The Semantic Web: Perspectives For Machine Translation
In this paper we present a possible solution for improving the quality of on-line translation systems, using mechanisms and standards from the Semantic Web. We focus on example-based machine translation and on automating the extraction of translation examples by means of RDF repositories.
Publication date: 2001